


used as the trained patterns to be selected. When the trained patterns of
category 3 are selected in step S405, average values 313 of utterances by
training speakers in category 3 and covariance values 323 of utterances by
the training speakers in category 3 are used as the trained patterns to be
selected. Pattern by-characteristic selection unit 204 then finishes its
process.

Speaker adaptation processor 205 distorts the spectral frequency of an
LPC cepstral coefficient vector by the Oppenheim method of equation (2), using
a first utterance part of the vector of the input speech sound, where the
vector has already been calculated by acoustic analysis unit 202. The Oppenheim
method is also disclosed in Oppenheim, A.V. and Johnson, D.H., "Discrete
Representation of Signals," Proc. IEEE 60 (6): 681-691 (1972).

A distance measure of the utterance is calculated between the LPC
cepstral coefficient vector, of which the spectral frequency has been
distorted, and a pattern arrangement corresponding to the vocabulary
indicating the controlled device. The pattern arrangement has been generated
using the trained patterns determined by pattern selection unit 204. In other
words, LPC cepstral coefficient vector $X_\alpha$ is obtained by distorting
input LPC cepstral coefficient vector $X$ through a filter shown by equation
(2) using a frequency distortion coefficient $\alpha$. The frequency
distortion coefficient providing the most similar distance of all vectors
$X_\alpha$ is determined according to equation (3) in relation to LPC cepstral
coefficient vector $X_\alpha$.



\[
\tilde{z}^{-1} = \frac{z^{-1} - \alpha}{1 - \alpha z^{-1}} \tag{2}
\]

where

$\alpha$ is a vocal tract length normalization coefficient (frequency
distortion coefficient).
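
The effect of the filter of equation (2) on the frequency axis can be illustrated numerically. The following is a minimal sketch, not part of the disclosed embodiment: it evaluates the all-pass map on the unit circle and reads the warped angular frequency from its phase. The function name `warped_frequency` and the sample frequencies are hypothetical and chosen only for illustration.

```python
import numpy as np

def warped_frequency(omega, alpha):
    """Return the warped angular frequency given by the first-order all-pass map."""
    # z^-1 evaluated on the unit circle, z = exp(j * omega)
    z_inv = np.exp(-1j * np.asarray(omega, dtype=float))
    # Equation (2): warped z^-1 = (z^-1 - alpha) / (1 - alpha * z^-1)
    z_tilde_inv = (z_inv - alpha) / (1.0 - alpha * z_inv)
    # The result equals exp(-j * omega_tilde), so the warped frequency is minus its phase
    return -np.angle(z_tilde_inv)

# Illustrative frequencies in radians per sample (assumed values)
omega = np.array([0.0, 0.5, 1.0, 2.0, 3.0])
print(warped_frequency(omega, 0.35))
```

With a coefficient of zero the mapping is the identity; a positive coefficient shifts interior frequencies upward on the warped axis.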



\[
\hat{\alpha} = \arg\max_{\alpha} P(X_\alpha \mid \alpha, \theta) \tag{3}
\]

where

$P$ is a probability (similarity),
$\alpha$ is a vocal tract length normalization coefficient (frequency
distortion coefficient),
$X$ is an LPC cepstral coefficient vector, and
$\theta$ is a trained pattern.
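
The selection expressed by equation (3) amounts to scoring each candidate coefficient and keeping the best one. The sketch below assumes a caller-supplied scoring function that stands in for the similarity P; `select_alpha` and `toy_score` are hypothetical names, and the toy score is an arbitrary placeholder rather than the similarity used in the embodiment.

```python
from typing import Callable, Iterable

def select_alpha(candidates: Iterable[float], score: Callable[[float], float]) -> float:
    """Return the candidate coefficient whose similarity score is highest."""
    return max(candidates, key=score)

def toy_score(alpha: float) -> float:
    # Placeholder similarity for illustration only; it peaks at alpha = 0.35
    return -(alpha - 0.35) ** 2

best = select_alpha([0.33, 0.35, 0.37], toy_score)
print(best)
```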


A process by speaker adaptation processor 205 will be hereinafter 
described using a flow chart shown in Fig. 5. 

Three initial values ($\alpha_{\mathrm{def}} - \Delta\alpha_1$,
$\alpha_{\mathrm{def}}$, $\alpha_{\mathrm{def}} + \Delta\alpha_1$) of the
spectral frequency distortion coefficient to be evaluated are first set (step
S501). Preferably, $\alpha_{\mathrm{def}}$ is 0.20 to 0.50 and
$\Delta\alpha_1$ is 0.005 to 0.100 when the sampling frequency of speech
sounds is 10 kHz, and the present embodiment employs
$\alpha_{\mathrm{def}} = 0.35$ and $\Delta\alpha_1 = 0.02$.
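
As a small worked example of step S501, the three initial coefficients can be generated from the default value and the step width. The function name `initial_candidates` is hypothetical; the default arguments are the values stated for the present embodiment.

```python
def initial_candidates(alpha_def=0.35, delta_alpha=0.02):
    """Return the three initial distortion coefficients set in step S501."""
    # Rounding only suppresses floating-point noise in the printed values
    return [round(alpha_def + sign * delta_alpha, 6) for sign in (-1, 0, 1)]

print(initial_candidates())
```

With the embodiment's values this gives 0.33, 0.35, and 0.37.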

Speaker adaptation processor 205 then calculates three sets of LPC
cepstral coefficient vectors using a spectral frequency distortion
calculation (step S502). In this calculation, processor 205 passes the first
utterance part of the LPC cepstral coefficient vector of the user's utterance
through the following filter to distort the spectrum of the LPC cepstral
coefficient vector (hereinafter called a spectral frequency distortion
calculation). The filter is represented by equation (2) using the spectral
frequency distortion coefficients set in step S501. The LPC cepstral
coefficients of the user's utterance have already been obtained by acoustic
analysis unit 202.
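
One way to carry out such a spectral frequency distortion calculation directly on cepstral coefficients is a recursion commonly used for all-pass frequency transformation. The sketch below is an assumption about how step S502 could be realized, not the implementation of the embodiment; `warp_cepstrum`, `out_order`, and the sample frame are hypothetical.

```python
import numpy as np

def warp_cepstrum(c, alpha, out_order=None):
    """Warp cepstral coefficients c[0..M] with the all-pass coefficient alpha."""
    c = np.asarray(c, dtype=float)
    m_out = len(c) - 1 if out_order is None else out_order
    g = np.zeros(m_out + 1)
    b = 1.0 - alpha * alpha
    # Feed the input coefficients from the highest order down to c[0]
    for c_in in c[::-1]:
        d = g.copy()  # coefficients from the previous pass
        g[0] = c_in + alpha * d[0]
        if m_out >= 1:
            g[1] = b * d[0] + alpha * d[1]
        for j in range(2, m_out + 1):
            # g[j - 1] is already updated in this pass; d[...] holds old values
            g[j] = d[j - 1] + alpha * (d[j] - g[j - 1])
    return g

# Three warped versions of one illustrative frame (assumed values)
frame = [1.0, 0.5, -0.2, 0.1]
warped_sets = {a: warp_cepstrum(frame, a) for a in (0.33, 0.35, 0.37)}
```

The output order is truncated to the input order by default, so the warped vector is an approximation; a larger `out_order` retains more of the warped spectrum.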

Next, speaker adaptation processor 205 stores the trained patterns 
that are determined by pattern selection unit 204 and the recognition result 
of the vocabulary indicating a controlled device (step S503). 

Next, processor 205 calculates distances between the three sets of LPC
cepstral coefficient vectors determined in step S502 and a pattern
arrangement formed using the trained patterns that are determined in step
S503, based on the recognition result obtained in step S503 (step S504).
When the device selection word determined by pattern selection unit 204 is
"Television" and the trained pattern belongs to category 2, the simplified
Mahalanobis' distance L is expressed for each LPC cepstral coefficient as
follows:

\[
L_{51k} = \sum_{i,\,\mathrm{time}} \left( B_{5ki} - 2 A_{5ki} \cdot X_{1} \right)
\]

where

\[
B_{5k} = \mu_{5k}^{t} \cdot W^{-1} \cdot \mu_{5k} - \mu_{x}^{t} \cdot W^{-1} \cdot \mu_{x},
\]

$\mu_{5k}$ is an average value of LPC cepstral coefficient vectors of state
(k) (phoneme order or time sequence) of an arrangement "Television"
of speech sound elements for category 2,

$\mu_{x}$ is an average value of LPC cepstral coefficient vectors of all
utterances by training speakers in category 2,

$W$ is a covariance value of LPC cepstral coefficient vectors of all
utterances by the training speakers in category 2, and

$X_{1}$ is an LPC cepstral coefficient vector when the spectral frequency
distortion coefficient is 0.33,

\[
L_{52k} = \sum_{i,\,\mathrm{time}} \left( B_{5ki} - 2 A_{5ki} \cdot X_{2} \right)
\]

where

$X_{2}$ is an LPC cepstral coefficient vector when the spectral
frequency distortion coefficient is 0.35, and

\[
L_{53k} = \sum_{i,\,\mathrm{time}} \left( B_{5ki} - 2 A_{5ki} \cdot X_{3} \right)
\]

where

$X_{3}$ is an LPC cepstral coefficient vector when the spectral
frequency distortion coefficient is 0.37.
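
The three distances of step S504 can be sketched as follows, assuming the simplified distance is a sum over time and coefficient index of (B - 2 A X), which is how the expressions above are read here. The array shapes, the alignment of frames to states, and the names `simplified_distance` and `nearest_alpha` are hypothetical; the numbers are placeholders, not trained patterns.

```python
import numpy as np

def simplified_distance(A, B, X):
    """Sum (B - 2 * A * X) over all frames and coefficient indices."""
    A, B, X = (np.asarray(v, dtype=float) for v in (A, B, X))
    return float(np.sum(B - 2.0 * A * X))

def nearest_alpha(A, B, warped_by_alpha):
    """Return (alpha, distance) for the candidate giving the smallest distance."""
    distances = {a: simplified_distance(A, B, X) for a, X in warped_by_alpha.items()}
    best = min(distances, key=distances.get)
    return best, distances[best]

# Placeholder pattern terms and warped inputs, shape (frames, coefficients)
A = np.ones((2, 3))
B = np.zeros((2, 3))
warped_by_alpha = {
    0.33: np.full((2, 3), 0.1),
    0.35: np.full((2, 3), 0.5),
    0.37: np.full((2, 3), 0.2),
}
best_alpha, best_distance = nearest_alpha(A, B, warped_by_alpha)
```

Under this assumed form a smaller value means a nearer match, so the minimum is taken.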

Next, speaker adaptation processor 205 discriminates and determines
a spectral frequency distortion coefficient when the most similar, namely the
nearest, distance is obtained among the distances $L_{51k}$, $L_{52k}$, and
$L_{53k}$ obtained



